RLE Plots: Visualising Unwanted Variation in High Dimensional Data

Author: Gandolfo Luke C.
Speed Terence P.
Publication venue: Public Library of Science (PLoS)
Publication date: 11/04/2017

Unwanted variation can be highly problematic, so its detection is often crucial. Relative log expression (RLE) plots are a powerful tool for visualising such variation in high-dimensional data. We provide a detailed examination of these plots, with the aid of examples and simulation, explaining what they are and what they can reveal. RLE plots are particularly useful for assessing whether a procedure aimed at removing unwanted variation (i.e. a normalisation procedure) has been successful. Although originally devised for gene expression data from microarrays, these plots can also reveal unwanted variation in many other kinds of high-dimensional data where such variation can be problematic. Comment: 9 pages, 3 figures
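
RLE values are usually computed by subtracting, for each gene, its median log expression across all samples, and then drawing one boxplot of these values per sample; a box centred away from zero or noticeably wider than the others points to unwanted variation in that sample. The sketch below computes those values and the per-sample summaries a boxplot would display. It is a minimal NumPy illustration, not the authors' code: the function names `rle_values` and `rle_summary` are hypothetical, the input is assumed to be a genes x samples matrix already on the log scale, and plotting is omitted.

```python
import numpy as np

def rle_values(log_expr):
    """Relative log expression for a genes x samples matrix of log-scale values.

    Each gene's median across samples is subtracted from that gene's values,
    so genes free of unwanted variation give RLE values near zero.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    gene_medians = np.median(log_expr, axis=1, keepdims=True)
    return log_expr - gene_medians

def rle_summary(log_expr):
    """Per-sample median and interquartile range of the RLE values.

    These correspond to the centre line and box height of each sample's RLE boxplot.
    """
    rle = rle_values(log_expr)
    q1, med, q3 = np.percentile(rle, [25, 50, 75], axis=0)
    return med, q3 - q1

# Simulated example (hypothetical data): 1000 genes, 4 samples, where
# sample 4 carries an artificial +1 shift on the log2 scale.
rng = np.random.default_rng(0)
data = rng.normal(8.0, 2.0, size=(1000, 1)) + rng.normal(0.0, 0.1, size=(1000, 4))
data[:, 3] += 1.0
med, iqr = rle_summary(data)
print("per-sample RLE medians:", np.round(med, 2))
print("per-sample RLE IQRs:", np.round(iqr, 2))
```

Because each gene is centred on its own median, a sample-wide shift such as the one simulated above shows up as that sample's RLE median moving away from zero, while the gene-to-gene differences in baseline level cancel out.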

arXiv.org e-Print Archive

Directory of Open Access Journals

University of Melbourne Institutional Repository

A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6

Author: Bengtsson Henrik
Speed Terence P.
Wirapati Pratyaksha
Publication date: 02/08/2017

Motivation: High-resolution copy-number (CN) analysis has in recent years gained much attention, not only for identifying CN aberrations associated with a certain phenotype, but also for identifying CN polymorphisms. For such studies to be successful and cost-effective, the statistical methods have to be optimized. We propose a single-array preprocessing method, CRMA v2, for estimating full-resolution total CNs. It is applicable to all Affymetrix genotyping arrays, including the recent ones that also contain non-polymorphic probes. A reference signal is needed only at the last step, when calculating relative CNs.

Results: As with our method for earlier generations of arrays, this one controls for allelic crosstalk, probe affinities and PCR fragment-length effects. It also corrects for probe-sequence effects and for the co-hybridization of fragments digested by multiple enzymes, which takes place on the latest chips. We compare our method with Affymetrix's CN5 method and the dChip method by assessing how well each differentiates between various CN states at full resolution and with various amounts of smoothing. Although CRMA v2 is a single-array method, we observe that it performs as well as or better than alternative methods that use data from all arrays for their preprocessing. This shows that online analysis is possible in large-scale projects where additional arrays are introduced over time.

Availability: A bounded-memory implementation that can process any number of arrays is available in the open-source R package aroma.affymetrix.

Supplementary information: Supplementary data are available at Bioinformatics online.
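
The abstract notes that a reference is needed only at the final step, when relative CNs are calculated. That last step can be sketched as a per-locus ratio of a test array's total CN signal to a reference signal, scaled by the reference ploidy. This is an illustration under stated assumptions, not the aroma.affymetrix API: the helpers `relative_copy_numbers` and `log2_ratios` are hypothetical, the reference is assumed diploid, and taking the per-locus median over a pool of reference arrays is an assumed choice.

```python
import numpy as np

def relative_copy_numbers(theta, theta_ref_pool, ploidy=2.0):
    """Relative total copy numbers for one test array (hypothetical helper).

    theta:          total CN signals of the test array, shape (n_loci,)
    theta_ref_pool: total CN signals of reference arrays, shape (n_loci, n_refs)
    ploidy:         copy number assumed for the reference (2 for autosomes)

    The reference signal is the per-locus median across the pool, an
    assumption made for illustration.
    """
    theta = np.asarray(theta, dtype=float)
    theta_ref = np.median(np.asarray(theta_ref_pool, dtype=float), axis=1)
    return ploidy * theta / theta_ref

def log2_ratios(theta, theta_ref_pool):
    """The same comparison on the log2 scale: log2(theta / theta_ref)."""
    return np.log2(relative_copy_numbers(theta, theta_ref_pool, ploidy=1.0))

# Hypothetical signals for three loci and three reference arrays.
theta = [100.0, 150.0, 50.0]
pool = [[100.0, 100.0, 100.0],
        [90.0, 100.0, 110.0],
        [100.0, 100.0, 100.0]]
print(relative_copy_numbers(theta, pool))
print(log2_ratios(theta, pool))
```

Since each test array is preprocessed on its own and only this ratio touches the reference, arrays added later in a project can be handled without reprocessing earlier ones, which is the kind of online analysis the abstract describes.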

RERO DOC Digital Library

Exploration, Normalization, and Genotype Calls of High Density Oligonucleotide SNP Array Data

Author: Carvalho Benilton
Irizarry Rafael A
Speed Terence P.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 03/07/2006

In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, high-throughput measurement of gene expression has been the most popular application of microarray technology. For this application, various groups have demonstrated that modern statistical methodology can substantially improve the accuracy and precision of gene expression measurements relative to ad hoc procedures introduced by the designers and manufacturers of the technology. Other applications of microarrays are now becoming increasingly popular. In this paper we describe a preprocessing methodology for a technology designed to identify DNA sequence variants in specific genes or regions of the human genome that are associated with phenotypes of interest, such as disease. In particular, we describe methodology for preprocessing Affymetrix SNP chips and obtaining genotype calls from the preprocessed data. Using data from three relatively large studies, including one in which a large number of independent calls are available, we demonstrate how our procedure improves on existing approaches. Software implementing these ideas is available in the Bioconductor oligo package.

Collection Of Biostatistics Research Archive

Transcription factor binding site prediction with multivariate gene expression data

Author: Speed Terence P.
Wildermuth Mary C.
Zhang Nancy R.
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2007

Multi-sample microarray experiments have become a standard experimental method for studying biological systems. A frequent goal in such studies is to unravel the regulatory relationships between genes. During the last few years, regression models have been proposed for the de novo discovery of cis-acting regulatory sequences using gene expression data. However, when applied to multi-sample experiments, existing regression-based methods model each individual sample separately. To better capture the dynamic relationships in multi-sample microarray experiments, we propose a flexible method for the joint modeling of promoter sequence and multivariate expression data. In higher eukaryotic genomes, expression regulation usually involves combinatorial interaction between several transcription factors, and experiments have shown that the spacing between transcription factor binding sites can significantly affect their strength in activating gene expression. We propose an adaptive model-building procedure to capture such spacing-dependent cis-acting regulatory modules. We apply our methods to the analysis of microarray time-course experiments in yeast and in Arabidopsis, which exhibit very different dynamic temporal relationships. For both data sets, we found all of the well-known cis-acting regulatory elements relevant to each context and were also able to predict novel elements. Comment: Published at http://dx.doi.org/10.1214/07-AOAS142 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

arXiv.org e-Print Archive

CiteSeerX

Crossref

ScholarlyCommons@Penn

Gene Set Enrichment Analysis Made Simple

Author: Irizarry Rafael A.
Speed Terence P.
Wang Chi
Zhou Yun
Publication venue: Collection of Biostatistics Research Archive
Publication date: 21/04/2009

Among the many applications of microarray technology, one of the most popular is the identification of genes that are differentially expressed between two conditions. A common statistical approach is to quantify the interest of each gene with a p-value, adjust these p-values for multiple comparisons, choose an appropriate cut-off, and create a list of candidate genes. This approach has been criticized for ignoring biological knowledge about how genes work together. Recently, a series of methods that do incorporate such knowledge have been proposed. However, many of these methods seem overly complicated. Furthermore, the most popular method, Gene Set Enrichment Analysis (GSEA), is based on a statistical test known for its lack of sensitivity. In this paper we compare the performance of GSEA with that of a simple alternative, and we find that the simple alternative clearly outperforms GSEA. We demonstrate this with eight different microarray datasets.
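
To make the contrast with per-gene lists concrete, the sketch below scores a whole gene set with one simple statistic: per-gene two-sample t-statistics are averaged over the N genes in the set and scaled by sqrt(N), then compared with a standard normal. This is offered as an assumed illustration of what a simple gene-set test can look like, not as a verbatim reproduction of the paper's procedure; the helpers `welch_t`, `gene_set_z` and `two_sided_p` are hypothetical, and the Welch form of the t-statistic is a choice made here.

```python
import math
import numpy as np

def welch_t(x, y):
    """Per-gene Welch two-sample t-statistics.

    x, y: log-expression arrays of shape (n_genes, n_samples) for the two conditions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x.mean(axis=1) - y.mean(axis=1)
    se = np.sqrt(x.var(axis=1, ddof=1) / x.shape[1] + y.var(axis=1, ddof=1) / y.shape[1])
    return diff / se

def gene_set_z(t, members):
    """sqrt(N) * mean of the t-statistics of the N genes in a set.

    If the per-gene statistics were independent and approximately N(0, 1)
    under the null, this score would also be approximately N(0, 1).
    """
    t_set = np.asarray(t, dtype=float)[members]
    return math.sqrt(t_set.size) * float(t_set.mean())

def two_sided_p(z):
    """Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical data: 6 genes, 3 samples per condition; genes 0-3 form the set.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(6, 3))
y = rng.normal(0.0, 1.0, size=(6, 3))
x[:4] += 2.0
z = gene_set_z(welch_t(x, y), [0, 1, 2, 3])
print("set score:", round(z, 2), "p-value:", two_sided_p(z))
```

One caveat on the design: genes in a set are often correlated, so the standard-normal reference can overstate significance; permuting sample labels is a common way to calibrate such a score.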

Crossref

PubMed Central

Collection Of Biostatistics Research Archive

Global analyses of mRNA translational control during early Drosophila embryogenesis

Author: Ahn Soyeon
Qin Xiaoli
Rubin Gerald M
Speed Terence P
Publication venue: BioMed Central
Publication date: 01/01/2007

The polysomal profiles of over 15,000 transcripts during the first ten hours after egg laying have been determined.

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

University of Melbourne Institutional Repository

Proximal genomic localization of STAT1 binding and regulated transcriptional activity

Author: Hilton Douglas J
Smyth Gordon K
Speed Terence P
Wormald Samuel
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Signal transducer and activator of transcription (STAT) proteins are key regulators of gene expression in response to the interferon (IFN) family of anti-viral and anti-microbial cytokines. We have examined the genomic relationship between STAT1 binding and regulated transcription using multiple tiling microarray and chromatin immunoprecipitation microarray (ChIP-chip) experiments from public repositories.

RESULTS: In response to IFN-γ, STAT1 bound proximally to regions of the genome that exhibit regulated transcriptional activity. This finding was consistent between different tiling microarray platforms, and between different measures of transcriptional activity, including differential binding of RNA polymerase II and differential mRNA transcription. Re-analysis of tiling microarray data from a recent study of IFN-γ-induced STAT1 ChIP-chip and mRNA expression revealed that STAT1 binding is tightly associated with localized mRNA transcription in response to IFN-γ. Close relationships were also apparent between STAT1 binding, STAT2 binding, and mRNA transcription in response to IFN-α. Furthermore, we found that sites of STAT1 binding within the Encyclopedia of DNA Elements (ENCODE) region are precisely correlated with sites of either enhanced or diminished binding by the RNA polymerase II complex.

CONCLUSION: Together, our results indicate that STAT1 binds proximally to regions of the genome that exhibit regulated transcriptional activity. This finding establishes a generalized basis for the positioning of STAT1 binding sites within the genome, and supports a role for STAT1 in the direct recruitment of the RNA polymerase II complex to the promoters of IFN-γ-responsive genes.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

University of Queensland eSpace